ZK | ⌬ arabic-concordances-explorer

⛁ arabic-concordances-explorer
◊ data-driven learning applied to learning Arabic
◊ concordance-based tools are better built as offline apps (→ Godot), because they often need large amounts of data...
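
For reference, the core of a concordance view is a keyword-in-context (KWIC) lookup. The following is only a rough Python sketch, not the explorer's actual implementation: the names `strip_diacritics` and `kwic` are made up for illustration, tokenization is plain whitespace splitting, and matching ignores Arabic short-vowel marks (tashkeel) and tatweel, which is an assumption about what a learner would want.

```python
import re

# Assumption: strip tashkeel (U+064B..U+0652), superscript alef (U+0670)
# and tatweel (U+0640) so vocalized and unvocalized spellings match.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")


def strip_diacritics(word):
    """Return the word without short-vowel marks and tatweel."""
    return _DIACRITICS.sub("", word)


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword.

    Tokens are split on whitespace; `width` is the number of tokens
    kept on each side of the match.
    """
    tokens = text.split()
    target = strip_diacritics(keyword)
    rows = []
    for i, token in enumerate(tokens):
        if strip_diacritics(token) == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows
```

Because every query scans the whole text, this only stays usable on a small corpus; with the amounts of data mentioned above, a prebuilt index stored locally would likely be needed.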

add_post_translation_note_to_concordance_expl.mp4

Possible Sources

Sinai Corpus: the first thing you find, but it has a weird encoding and I'm not sure how to use it (sparsely documented)
https://github.com/linuxscout/tashkeela2/blob/master/data/Interviews/Int07.xml seems like a decent amount of data (diverse topics), but it's spread across folders and XML files, which would need cleaning and merging
1800 Tweets, Jordanian or MSA, but in XLSX
Arabic Wikipedia, comes with dump, corpus and instructions
undocumented zip corpus
Arabic Big Corpus, which is not actually very big and may be exclusively Quran
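
For the sources that are scattered across folders and XML files, the cleaning and merging step could look roughly like the Python sketch below. It assumes nothing about the tag layout of any particular corpus: it simply collects all text nodes of each file into one line of a merged UTF-8 file and skips files that fail to parse. The function name `merge_xml_text` and both path arguments are placeholders.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def merge_xml_text(root_dir, out_path):
    """Write the text of every .xml file under root_dir to out_path.

    Each file becomes one line with whitespace collapsed.
    Returns the number of files that contributed text.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for xml_file in sorted(Path(root_dir).rglob("*.xml")):
            try:
                tree = ET.parse(xml_file)
            except ET.ParseError:
                continue  # skip malformed files instead of aborting
            # itertext() yields every text node regardless of tag names
            raw = " ".join(tree.getroot().itertext())
            text = " ".join(raw.split())
            if text:
                out.write(text + "\n")
                count += 1
    return count
```

Keeping one source file per line makes it easy to trace a concordance hit back to its file later; real cleaning (metadata, speaker labels, mixed scripts) would still have to be decided per corpus.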